The paper is about a class of languages that extends context-free languages (CFL) and is stable under
shuffle. Specifically, we investigate the class of partially-commutative context-free languages
(pcCFL), where non-terminal symbols are commutative according to a binary independence relation,
very much like in trace theory. The class has been recently proposed as a robust class subsuming CFL
and commutative CFL. This paper surveys properties of pcCFL. We identify a natural corresponding
automaton model: stateless multi-pushdown automata. We show stability of the class under natural
operations, including homomorphic images and shuffle. Finally, we relate expressiveness of pcCFL
to two other relevant classes: CFL extended with shuffle and trace-closures of CFL. Among technical
contributions of the paper are pumping lemmas, as an elegant completion of known pumping
properties of regular languages, CFL and commutative CFL.

1 Introduction 

Closure of language classes under shuffle is intensively investigated; see for instance [3] and further
references therein. This paper is about a subtle way of introducing shuffle into context-free grammars.

Process algebraic motivation. In the context of infinite-state verification there are two basic
well-known classes of systems. Context-free processes, traditionally called BPA¹ [2], stand for the most
fundamental abstract model of sequential recursive programs. BPA contains configuration graphs induced
by context-free grammars in Greibach normal form. The commutative variant, commutative context-free
processes, traditionally called BPP², was proposed in [?] as the abstract model of concurrent programs.
BPP differs from BPA in that it has parallel composition instead of sequential composition. Thus a
configuration is a finite multiset of non-terminals rather than a sequence.

A natural generalization of both BPA and BPP is Process Algebra (PA) [2], where one allows for both
kinds of composition.³ A standard reference for a process-rewrite formulation of PA is [12]. However,
PA does not seem to have good algorithmic properties. For instance, bisimulation equivalence is not
known to be decidable, a long-standing open problem [15], while the algorithm for normed PA is very
complex and as costly as doubly exponential time [11]. This has recently motivated investigation of
an alternative but equally natural generalization of both BPA and BPP, namely partially-commutative
context-free processes, called BPC⁴ in [?]. BPC processes are also defined by a Greibach grammar, but
one additionally assumes a binary independence relation among non-terminals, like in trace theory [?],
and only independent pairs of non-terminals commute. We stress that the independence is imposed not on
alphabet letters, which is usually the case in trace languages, but on non-terminals. Thus a configuration
may be modeled as a trace over non-terminals. BPA is a special case where no non-terminals commute
while BPP, on the other hand, is another special case where all non-terminals commute.

*The first author acknowledges partial support by the Polish MNiSW grant N N206 568640.
†The second author acknowledges partial support by the Polish MNiSW grant N N206 356036.
¹ A shorthand for Basic Process Algebra.
² A shorthand for Basic Parallel Processes Algebra.
³ The algebra of [2] also includes the left merge operation, not considered in this paper.
⁴ A shorthand for Basic Partially Commutative Algebra.

In [6, 7] an efficient polynomial-time procedure has been developed for bisimulation equivalence
that works correctly in the subclass of normed BPC that strictly contains both normed BPA and BPP.
We also very recently analyzed the reachability problem for BPC [?]. In this paper we continue the
program that aims at finding a robust class subsuming BPA and BPP, this time however from the
language-theoretic perspective.

Language theoretic motivation. BPA clearly defines context-free languages (CFL) and BPP defines
so-called commutative context-free languages (cCFL) [5],⁵ equivalently characterized as languages of
communication-free Petri nets. In this paper we focus on partially-commutative context-free languages
(pcCFL) [7] that are defined by BPC. Our aim is to investigate properties of this class and to relate its
expressiveness to that of other classes.

The class pcCFL extends CFL and is closed under the shuffle operation. By a shuffle of two words
we mean here an arbitrary interleaving of these words and by a shuffle of two languages we mean all
shuffles of all pairs of words from the two languages. Other similar extensions of CFL may be found
in the literature. One such extension is PA languages. We use the shorthand shuffle CFL for this class:
as far as languages are concerned, PA is equivalent to context-free grammars where one may use both
concatenation and shuffle in productions [10, 14]. Another related class is trace-closures of CFL (we name
this class trace CFL), where one assumes, contrary to pcCFL, an independence relation on alphabet letters.
We have found it appealing to relate the expressive power of pcCFL to that of shuffle CFL and trace CFL.

Our contribution. First, we show that a relevant subclass of pcCFL, subject to the restriction that
the complement of the independence relation is transitive, has a natural corresponding automaton model:
stateless multi-pushdown automata (Section 2). We also prove that the membership problem for pcCFL is
NP-complete, thus the complexity remains the same as for cCFL (Section 3).

Second, in Section 4 we investigate stability of pcCFL under natural operations. In particular, pcCFL
turns out to be stable under homomorphic images, substitutions and shuffle. On the other hand, the class
is not stable under inverse homomorphic images and under intersections with regular languages. The
latter is not very surprising as we consider a natural extension of cCFL, a class that lacks not only
these two closure properties, but even lacks closure under concatenation and homomorphic images! With
pcCFL one regains closure under concatenation and homomorphic images.

Third, in Sections 6 and 7 we perform a mutual comparison of expressiveness of pcCFL, transitive
pcCFL, shuffle CFL and trace CFL, proving them all pairwise incomparable (except for the trivial inclusion of
transitive pcCFL in pcCFL, which we prove to be strict). Note that incomparability with respect to languages
implies incomparability with respect to bisimulation or other equivalences. As one of the tools we
formulate and prove pumping lemmas for the classes pcCFL, transitive pcCFL and shuffle CFL. This provides an
elegant completion of known pumping properties of regular, context-free and commutative context-free
languages.

Technically, the most difficult part is Sections 6 and 7. On the other hand, the results of Sections 2
and 4 confirm clearly that pcCFL is a natural class of languages extending CFL, with good algorithmic and
closure properties.

Yet another relevant language class is that defined by so-called Dynamic Pushdown Networks [?].
The class extends CFL and is closed under shuffle. We do not investigate this class here, but we conjecture
that it is incomparable with pcCFL.



Some of the proofs are omitted due to space limitations.

⁵ In fact, BPA and BPP define CFL and cCFL not containing the empty word, respectively.

2 Preliminaries 

By an interleaving of two words $w$ and $v$, of length $m$ and $n$ respectively, we mean any word $u$ of length
$m + n$ such that its positions $I = \{1, \ldots, m + n\}$ may be split into two disjoint sets $I_w$ and $I_v$ such that $u$
restricted to $I_w$ equals $w$ and $u$ restricted to $I_v$ equals $v$. Let $w \parallel v$ denote the set of all the interleavings of
$w$ and $v$, which is clearly a finite set. By a shuffle of two languages $L$ and $K$ we mean

$$L \parallel K = \bigcup_{w \in L,\, v \in K} w \parallel v.$$
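As an illustration only, the two definitions above can be sketched in Python for finite languages. The function names `interleavings` and `shuffle` are our own and are not used elsewhere in the paper.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def interleavings(w: str, v: str) -> frozenset:
    """Return the finite set w || v of all interleavings of w and v."""
    if not w:
        return frozenset({v})
    if not v:
        return frozenset({w})
    # The first letter of an interleaving comes either from w or from v.
    from_w = {w[0] + u for u in interleavings(w[1:], v)}
    from_v = {v[0] + u for u in interleavings(w, v[1:])}
    return frozenset(from_w | from_v)


def shuffle(L, K):
    """Shuffle of two finite languages: the union of w || v over w in L, v in K."""
    result = set()
    for w in L:
        for v in K:
            result |= interleavings(w, v)
    return result
```

For instance, `interleavings("ab", "c")` contains exactly the three words abc, acb and cab.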

Partially-commutative context-free languages. The class of languages to be defined below has been
introduced in [7]; however, our presentation and terminology here are different.

A Greibach context-free grammar consists of a finite alphabet, a finite set of non-terminal symbols
$V$ with a distinguished initial symbol $S \in V$ and a finite set of productions of the form

$$X \xrightarrow{a} \alpha, \qquad (1)$$

where $X \in V$, $\alpha \in V^*$ and $a$ is an alphabet letter. Additionally we assume that a grammar is always
equipped with a symmetric and irreflexive relation $I \subseteq V \times V$ called the independence relation. For
convenience we also use the complement $D = (V \times V) \setminus I$, called the dependence relation. Two
non-terminals $X, Y \in V$ are called independent if $(X, Y) \in I$, and otherwise dependent.

Any $\alpha \in V^*$ we call a configuration. A derivation is a sequence of configurations such that every
configuration is obtained from the preceding one via a step and the last one is the empty configuration.
There are two kinds of steps:

• production step: $X\beta \xrightarrow{a} \alpha\beta$, for a production $X \xrightarrow{a} \alpha$;

• swap step: $\alpha X Y \beta \to \alpha Y X \beta$, where $X$ and $Y$ are independent.

Every derivation defines a word $w$ obtained by concatenation of alphabet letters occurring in the
production steps. We write $\alpha \xrightarrow{w} \beta$ if there is a derivation that defines $w$, starts in $\alpha$ and ends in $\beta$. We usually
assume that a derivation starts with a configuration consisting of a single non-terminal, say $X$. If $X \xrightarrow{w} \varepsilon$
then we say that $X$ generates $w$. Note that the length of $w$ is the same as the number of production steps
performed in any derivation that defines $w$. We assume wlog. that every non-terminal $X$ generates some
word.

The language generated by a grammar is the set of all words generated by the initial non-terminal.
The class of all so generated languages we call partially-commutative context-free languages (pcCFL) [7].
It clearly contains all context-free languages (CFL) and commutative context-free languages⁶ (cCFL) [5].
These two subclasses are special cases, where the independence is either empty, or contains all pairs of
distinct non-terminals, respectively.

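⁶ The commutative context-free languages are also called BPP languages.

Example 1. For illustration, consider the grammar:

$$P \xrightarrow{a} W B C \bar{B} \qquad W \xrightarrow{a} W B C \qquad B \xrightarrow{b} \varepsilon \qquad \bar{B} \xrightarrow{\bar{b}} \varepsilon$$

$$W \xrightarrow{\bar{a}} \bar{C} \qquad C \xrightarrow{c} \varepsilon \qquad \bar{C} \xrightarrow{\bar{c}} \varepsilon$$

The initial non-terminal is $P$ and the independence relation is the symmetric closure of $\{B, \bar{B}\} \times
\{C, \bar{C}\}$. Here is an example derivation of the word $a\bar{a}b\bar{b}\bar{c}c$:

$$P \xrightarrow{a} W B C \bar{B} \xrightarrow{\bar{a}} \bar{C} B C \bar{B} \to \bar{C} B \bar{B} C \to B \bar{C} \bar{B} C \to B \bar{B} \bar{C} C \xrightarrow{b} \bar{B} \bar{C} C \xrightarrow{\bar{b}} \bar{C} C \xrightarrow{\bar{c}} C \xrightarrow{c} \varepsilon$$

In a similar way a word $a^n \bar{a} b^n \bar{b} \bar{c} c^n$ is generated, for any $n \ge 1$, but also $a^n \bar{a} \bar{c} c^n b^n \bar{b}$ or $a^n \bar{a} \bar{c} b^n c^n \bar{b}$. The
language generated is

$$\bigcup_{n \ge 1} a^n \bar{a} \, (b^n \bar{b} \parallel \bar{c} c^n).$$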
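The derivation relation can also be explored mechanically. The following Python sketch, offered as an illustration under our reading of the grammar of Example 1, enumerates all words up to a given length by exhaustive search over configurations. The encoding is our own assumption: barred letters are written in upper case (`A`, `B`, `C` as letters), barred non-terminals are named `Bbar` and `Cbar`, and the function `words_up_to` is hypothetical.

```python
from collections import deque

# Productions X --letter--> alpha, encoded as (X, letter, alpha).
PRODUCTIONS = [
    ("P", "a", ("W", "B", "C", "Bbar")),
    ("W", "a", ("W", "B", "C")),
    ("W", "A", ("Cbar",)),
    ("B", "b", ()),
    ("Bbar", "B", ()),
    ("C", "c", ()),
    ("Cbar", "C", ()),
]
# Independence: symmetric closure of {B, Bbar} x {C, Cbar}.
INDEPENDENT = {(x, y) for x in ("B", "Bbar") for y in ("C", "Cbar")}
INDEPENDENT |= {(y, x) for (x, y) in INDEPENDENT}


def words_up_to(start, max_len):
    """All words of length <= max_len generated by `start` (exhaustive search)."""
    words = set()
    seen = {((start,), "")}
    queue = deque(seen)
    while queue:
        config, word = queue.popleft()
        if not config:
            words.add(word)
            continue
        successors = []
        # Production step: rewrite the leftmost non-terminal.
        if len(word) < max_len:
            for lhs, letter, rhs in PRODUCTIONS:
                if lhs == config[0]:
                    successors.append((rhs + config[1:], word + letter))
        # Swap step: exchange two adjacent independent non-terminals.
        for i in range(len(config) - 1):
            if (config[i], config[i + 1]) in INDEPENDENT:
                swapped = config[:i] + (config[i + 1], config[i]) + config[i + 2:]
                successors.append((swapped, word))
        for item in successors:
            if item not in seen:
                seen.add(item)
                queue.append(item)
    return words
```

With `max_len = 6` only the case n = 1 fits, so the search returns the six interleavings of the two thread words, each prefixed by the first two letters.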

We might have defined configurations as Mazurkiewicz traces [13] rather than words over
non-terminals (like in [?]). This would mean that trace equivalent configurations are not distinguished. In our
terminology, two configurations are trace equivalent when one may be transformed into the other using
solely swap steps. It is our deliberate choice to keep the swap steps explicit.

Transitive dependence. We distinguish a subclass of pcCFL where dependence is assumed to be
transitive, being thus an equivalence. This subclass we name transitive pcCFL. Equivalence classes of
dependence will be called threads.

In Example 1 the dependence is not transitive, as it contains (P,B) and (P,C) but not (B,C). In fact
we show later that this language does not belong to transitive pcCFL. Both CFL and cCFL are strict
subclasses of transitive pcCFL.

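Example 2. As an illustration, consider the language generated by:

$$S \xrightarrow{s} \varepsilon \qquad S \xrightarrow{a} S A \qquad A \xrightarrow{c} A' \qquad A' \xrightarrow{a} \varepsilon$$

$$S \xrightarrow{b} S B \qquad B \xrightarrow{c} B' \qquad B' \xrightarrow{b} \varepsilon$$

with initial non-terminal $S$ and the threads $\{S, A, B\}$, $\{A'\}$ and $\{B'\}$. Here is an example derivation
of the word $absccab$:

$$S \xrightarrow{a} S A \xrightarrow{b} S B A \xrightarrow{s} B A \xrightarrow{c} B' A \to A B' \xrightarrow{c} A' B' \xrightarrow{a} B' \xrightarrow{b} \varepsilon.$$

The language contains words of the form $wsv$, where $w$ contains only $a$ and $b$ and $v$ contains only $a$, $b$ and
$c$. Writing $\#_a(w)$ for the number of occurrences of $a$ in $w$ and $|w|$ for the length of $w$, we may characterize
the language by the following conditions:

• $\#_a(w) = \#_a(v)$, $\#_b(w) = \#_b(v)$ and $\#_c(v) = \#_a(v) + \#_b(v)$,

• any prefix $v'$ of $v$ and any suffix $w'$ of $w$ such that $\#_c(v') = |w'|$ fulfill

$$\#_a(w') \ge \#_a(v') \quad \text{and} \quad \#_b(w') \ge \#_b(v').$$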

Automaton model. A multi-pushdown automaton is like a single-pushdown one. In a single step one
symbol is popped from one of the stacks⁷ and a number of symbols are pushed on the stacks. The
number of stacks is fixed for an automaton. Assume there is only one state, or equivalently no state, and
$k$ stacks. Then a transition of an automaton is of the form:

$$X \xrightarrow{a} \alpha_1 \ldots \alpha_k, \qquad (2)$$

to mean that when an automaton reads $a$, it pops $X$ and pushes the sequence of symbols $\alpha_i$ on the $i$th
stack, for $i = 1 \ldots k$. Observe that wlog. one may assume that stack alphabets are disjoint. The following
result is an easy observation:

Theorem 1 ([8]) The transitive pcCFL class is expressively equivalent to stateless multi-pushdown automata.

Indeed, an equivalence class of configurations with respect to trace equivalence is represented by a tuple
of strings, one per thread. Similarly, a production $X \xrightarrow{a} \alpha$ is represented, up to swap steps, exactly as
in (2), with $\alpha_i$ being the projection of $\alpha$ on the $i$th thread.

⁷ If we allowed for popping from more than one stack at a time, the model would clearly become Turing-complete, even
with only one state.

Similarly, one could also define an operational model for general pcCFL, with a stack replaced by a
partially ordered structure.
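To make the correspondence concrete, here is a minimal Python sketch of a stateless three-stack automaton for our reading of the grammar of Example 2, with one stack per thread. The names `STACK_OF`, `TRANSITIONS` and `accepts` are hypothetical, and acceptance by empty stacks at the end of the input is an assumption that mirrors derivations ending in the empty configuration.

```python
# Each non-terminal lives on the stack of its thread.
STACK_OF = {"S": 0, "A": 0, "B": 0, "A'": 1, "B'": 2}

# (popped symbol, input letter) -> list of tuples (push on stack 0, 1, 2).
TRANSITIONS = {
    ("S", "s"): [((), (), ())],
    ("S", "a"): [(("S", "A"), (), ())],
    ("S", "b"): [(("S", "B"), (), ())],
    ("A", "c"): [((), ("A'",), ())],
    ("B", "c"): [((), (), ("B'",))],
    ("A'", "a"): [((), (), ())],
    ("B'", "b"): [((), (), ())],
}


def accepts(word, start="S", k=3):
    """Simulate all runs; accept if some run ends with every stack empty."""
    initial = tuple((start,) if i == STACK_OF[start] else () for i in range(k))
    frontier = {initial}
    for letter in word:
        next_frontier = set()
        for stacks in frontier:
            # A step pops the top symbol of exactly one non-empty stack.
            for i, stack in enumerate(stacks):
                if not stack:
                    continue
                for pushes in TRANSITIONS.get((stack[0], letter), []):
                    rest = list(stacks)
                    rest[i] = stack[1:]
                    next_frontier.add(
                        tuple(pushes[j] + rest[j] for j in range(k))
                    )
        frontier = next_frontier
    return any(all(not s for s in stacks) for stacks in frontier)
```

Both `accepts("absccab")` and `accepts("abscbca")` hold, since after the letter s the tops of different stacks may be popped in either order.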

3 Derivation trees 

It is very convenient to use derivation trees instead of derivations themselves. However it is not com- 
pletely obvious how to define this notion in presence of commutativity of non-terminals. Below we adopt 
an intuitive approach using colors. 

Fix a derivation $X \xrightarrow{w} \varepsilon$. Clearly a configuration is a sequence of non-terminal occurrences. We
assume that every non-terminal occurrence in a derivation will be colored, including the occurrence of $X$
in the initial configuration. We impose the following simple discipline of coloring:

• if a swap step $\alpha X Y \beta \to \alpha Y X \beta$ is performed, every non-terminal occurrence in the right-hand
side configuration inherits its color from the corresponding occurrence of the same non-terminal
on the left-hand side.

• if a production step $X\beta \xrightarrow{a} \alpha\beta$ is performed, the non-terminal occurrences in $\beta$ preserve their
colors, while all the non-terminal occurrences in $\alpha$ get fresh colors. Note that the color of the
occurrence of $X$ in the beginning of $X\beta$ disappears as a result of the step. We say that this
disappearing color drops the fresh colors.

Intuitively, a color is intended to represent the 'life cycle' of one occurrence of a non-terminal during 
a derivation. Observe that non-terminal occurrences in a given configuration are always labeled with 
different colors, and that the total number of colors used in a derivation equals the number of production 
steps. 

Example 3. A disciplined coloring of the derivation from Example 2 is shown below. Colors are 1, 2, ...
and the coloring is denoted by subscripts.

$$S_1 \xrightarrow{a} S_2 A_3 \xrightarrow{b} S_4 B_5 A_3 \xrightarrow{s} B_5 A_3 \xrightarrow{c} B'_6 A_3 \to A_3 B'_6 \xrightarrow{c} A'_7 B'_6 \xrightarrow{a} B'_6 \xrightarrow{b} \varepsilon. \qquad (3)$$

Color 1 drops colors 2 and 3, color 3 drops color 7, etc.

With the use of our coloring discipline, every derivation induces naturally a tree. The tree nodes
are all colors appearing in the derivation. The color $c_1$ is a parent of $c_2$ precisely if $c_1$ drops $c_2$. Every
tree node $c$ is labeled by a non-terminal. If convenient, one may think that every node is labeled by the
production that made color $c$ disappear.
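The coloring discipline is easy to replay mechanically. The Python sketch below is an illustration under our own encoding: a derivation is a list of steps, either `("prod", rhs)` rewriting the leftmost occurrence or `("swap", i)` exchanging positions i and i+1. The names `derivation_tree` and `DERIVATION_3` are hypothetical, and `DERIVATION_3` encodes our reading of derivation (3).

```python
from itertools import count


def derivation_tree(start, steps):
    """Replay a derivation with colors; return (labels, parent).

    labels maps each color to its non-terminal, parent maps each color
    to the color that dropped it.
    """
    fresh = count(1)
    root = next(fresh)
    labels = {root: start}
    parent = {}
    config = [root]  # the configuration as a list of colors
    for step in steps:
        if step[0] == "swap":
            i = step[1]
            # Swapped occurrences keep their colors.
            config[i], config[i + 1] = config[i + 1], config[i]
        else:
            rhs = step[1]
            dropped_by = config.pop(0)  # this color disappears
            new_colors = []
            for symbol in rhs:
                c = next(fresh)
                labels[c] = symbol
                parent[c] = dropped_by
                new_colors.append(c)
            config[:0] = new_colors  # fresh colors replace the popped one
    return labels, parent


# Our encoding of derivation (3) from Example 3.
DERIVATION_3 = [
    ("prod", ("S", "A")), ("prod", ("S", "B")), ("prod", ()),
    ("prod", ("B'",)), ("swap", 0), ("prod", ("A'",)),
    ("prod", ()), ("prod", ()),
]
```

Running `derivation_tree("S", DERIVATION_3)` uses seven colors, one per production step, and the resulting parent map records that color 1 drops colors 2 and 3 and that color 3 drops color 7.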
There may be many different derivations inducing the same tree. Even worse, two derivations of
different words may induce the same tree, as shown in the example below. 

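Example 4. Continuing the last example, the derivation (3) induces the following tree (children are
indented below their parent):

1: $S \xrightarrow{a} SA$
    2: $S \xrightarrow{b} SB$
        4: $S \xrightarrow{s} \varepsilon$
        5: $B \xrightarrow{c} B'$
            6: $B' \xrightarrow{b} \varepsilon$
    3: $A \xrightarrow{c} A'$
        7: $A' \xrightarrow{a} \varepsilon$

However, exactly the same tree is induced by the derivation:

$$S_1 \xrightarrow{a} S_2 A_3 \xrightarrow{b} S_4 B_5 A_3 \xrightarrow{s} B_5 A_3 \xrightarrow{c} B'_6 A_3 \xrightarrow{b} A_3 \xrightarrow{c} A'_7 \xrightarrow{a} \varepsilon$$

of a different word $abscbca \neq absccab$. Intuitively, the words defined by subtrees rooted in 3 and 6,
namely $ca$ and $b$ respectively, this time come in a different order. In fact all the interleavings of these two
words are allowed.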

Useful properties. The examples confirm that our notion of derivation tree is more complex than 
the classical one. However, trees may be still very useful for reasoning about partially-commutative 
context-free languages, as they immediately bring to light the following useful properties: 

Induced SUBWORD. Given a derivation tree of a word w, every node c induces a subword (i.e. a 
subsequence but not an infix in general) of w. Indeed, the subword is obtained by concatenating only 
those letters from w whose color, as a tree node, belongs to the subtree rooted in c. We implicitly assign 
here to the letter of every production step a color that disappears in this step. For instance, for both words 
considered in the last example, the subword induced by the node 2 is bscb. Analogously one defines the 
subword induced by a subset of nodes of a derivation tree, assuming this subset to be an antichain with 
respect to the tree ancestor relation. 

Infix rearrangement. The induced subword may be rearranged into an infix. Let $L \in$ pcCFL and
let $v$ be the subword of $w \in L$ induced by a tree node $c$. Clearly, $w \in v \parallel u$, i.e., $v$ is interleaved with the
remaining subword $u$ of $w$. Then $u$ may be split into $u = u_1 u_2$ so that $u_1 v u_2 \in L$. Indeed, let $u_1$ be the
prefix of $w$ preceding the first letter of $v$. In any derivation, after $u_1$, the non-terminal $X$ that labels $c$ is
clearly active. Performing the whole derivation $X \xrightarrow{v} \varepsilon$ immediately after $u_1$ does the job.

SUBSTITUTIVITY. In any derivation tree, one may replace a subtree rooted in a node c by an arbitrary 
derivation tree t, assumed that both c and the root of t are labeled with the same non-terminal. The 
resulting tree is clearly induced by some derivation too. 

Membership problem. A derivation tree is of linear size in terms of the length of the word, which is
useful for easily obtaining the upper bound for the membership problem, where given a word $w$ and a
presentation of a language $L$, one asks whether $w \in L$.

Theorem 2 The membership problem is NP-complete both for pcCFL and transitive pcCFL.

NP-hardness follows easily from NP-hardness of the membership problem for cCFL, shown in [9]. The
NP upper bound one obtains easily: guess a tree and the order of its nodes, and then check in polynomial
time whether the tree is induced by some derivation of the given word that respects the order of nodes.
4 Closure properties

In this section we argue that the pcCFL and transitive pcCFL classes are closed under union and shuffle, and
pcCFL is closed under concatenation while transitive pcCFL is not. Then we show that pcCFL is closed under
homomorphic images and substitutions. In the case of transitive pcCFL we do not know the answer, however we
suppose it is negative. Finally, we show that both classes lack closure under inverse homomorphic images and
intersections with regular languages.

Comparing pcCFL with CFL, roughly speaking, one sacrifices intersection with regular languages and
inverse homomorphic images but one gains shuffle. Even if at first sight the properties listed above do
not seem exciting, one should remember that both the classes considered here subsume also commutative
context-free languages cCFL. Knowing that cCFL lacks closure under concatenation and homomorphic
images, as shown in [5], it seems that with pcCFL one retrieves these relevant closure properties. This
seems to confirm that pcCFL is a natural class of languages.

Union and complement. Both classes are closed under union and the construction is entirely standard. 
On the other hand none of the classes is closed under complement. 

Shuffle and concatenation. Both classes are closed under shuffle and the construction of a grammar for
the shuffle $L_1 \parallel L_2$ is easy. Wlog. assume that the grammars that generate the two languages use distinct
non-terminals. Let $S_1$ and $S_2$ be the initial non-terminals. Consider the union of grammars extended with
one additional initial non-terminal $S$. Add additional productions

$$S \xrightarrow{a} \alpha_1 S_2 \qquad S \xrightarrow{a} \alpha_2 S_1 \qquad (4)$$

for any production $S_1 \xrightarrow{a} \alpha_1$ or $S_2 \xrightarrow{a} \alpha_2$. Finally, extend independence by imposing that whenever two
non-terminals come from different grammars they are independent. This clearly preserves transitivity of
dependence.
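The shuffle construction can be sketched in Python as follows. This is an illustration only: the dictionary encoding of grammars and the names `nonterminals` and `shuffle_grammar` are our own. As an additional assumption, the fresh initial symbol is declared independent of every other non-terminal, i.e. placed in a thread of its own; since it never occurs next to another non-terminal this does not affect the generated language.

```python
def nonterminals(g):
    """All non-terminals occurring in grammar g."""
    symbols = {g["start"]}
    for lhs, _letter, rhs in g["productions"]:
        symbols.add(lhs)
        symbols.update(rhs)
    return symbols


def shuffle_grammar(g1, g2, new_start="S"):
    """Grammar for L(g1) || L(g2); non-terminal sets are assumed disjoint
    and new_start is assumed fresh."""
    s1, s2 = g1["start"], g2["start"]
    productions = set(g1["productions"]) | set(g2["productions"])
    # Productions (4): start with a step of one grammar, keep the other
    # grammar's initial symbol pending.
    for lhs, letter, rhs in g1["productions"]:
        if lhs == s1:
            productions.add((new_start, letter, rhs + (s2,)))
    for lhs, letter, rhs in g2["productions"]:
        if lhs == s2:
            productions.add((new_start, letter, rhs + (s1,)))
    independent = set(g1["independent"]) | set(g2["independent"])
    # Non-terminals from different grammars are independent.
    for x in nonterminals(g1):
        for y in nonterminals(g2):
            independent |= {(x, y), (y, x)}
    # Assumption: the fresh start symbol forms a thread of its own.
    for x in nonterminals(g1) | nonterminals(g2):
        independent |= {(new_start, x), (x, new_start)}
    return {"start": new_start, "productions": productions,
            "independent": independent}
```

For two one-production grammars generating the single-letter words a and b, the result contains the two new productions from S, one reading a and leaving the initial symbol of the second grammar, the other reading b and leaving the initial symbol of the first.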

In pcCFL, concatenation $L_1 L_2$ is obtained similarly as shuffle. The only difference is that two
non-terminals coming from different grammars are always declared dependent, and that only the left-hand
productions in (4) are added. Note that concatenation is in our setting no more natural than shuffle.

Transitive pcCFL is not closed under concatenation, which one shows similarly as for cCFL [5]. Consider
$L_1 = \{w : \#_a(w) = \#_b(w) = \#_c(w) \ge 1, \#_d(w) = 0\}$ and $L_2 = \{d\}$. In the derivation of some $w \in L_1 L_2$
a configuration is necessarily reached with at least two different threads nonempty, as otherwise the
language would be context-free. Thus the remaining suffix of $w$ is some shuffle of at least two words
generated by these non-empty threads, and only one of these words ends with $d$. If that subword is
generated first, the whole word is not in $L_1 L_2$, which proves that $L_1 L_2$ may not belong to transitive pcCFL.

Homomorphic images and substitutions. As we consider only Greibach grammars, the empty word
never belongs to a partially-commutative context-free language. Thus it is natural to consider only
homomorphisms $h$ that do not contain the empty word in the image: $h(a) \neq \varepsilon$ for all letters $a$. Below we
show that pcCFL is closed under images of such homomorphisms. For transitive pcCFL the question is still open; we
conjecture however a negative answer.

We prefer to show a slightly stronger result: pcCFL is closed under substitutions. A substitution
$s$ assigns to each alphabet letter $a$ a language $s(a) \in$ pcCFL. Similarly as above, we assume that the
languages $s(a)$ do not contain the empty word. For a language $L$, the substitution $L[s]$ contains all words
that may be obtained from a word in $L$ by replacing each letter $a$ with any word from $s(a)$.

Assume a language $L \in$ pcCFL, generated by a grammar $G$, and a substitution $s$. Thus each language
$s(a)$ has its generating grammar $G_a$. We describe the construction of the grammar $G'$ for $L[s]$. The
non-terminals of $G'$ will be the union of non-terminals of $G$ and all grammars $G_a$. Wlog. we assume that the
non-terminal sets are disjoint.

Consider an arbitrary production $X \xrightarrow{a} \alpha$ in $G$. Let $S_a$ be the initial non-terminal in $G_a$. For any
production $S_a \xrightarrow{b} \beta$ in $G_a$, we add to $G'$ the production $X \xrightarrow{b} \beta\alpha$. The independence in $G'$ is defined as
the set-theoretic union of independence relations of grammars $G$ and $G_a$. Thus any pair of non-terminals
coming from different grammars is declared dependent (note that this is not achievable if the dependence
has to be transitive).

The construction guarantees that $G'$ generates exactly $L[s]$. Indeed, once a production $X \xrightarrow{b} \beta\alpha$ is
fired, the non-terminals of $G_a$ block activity of other non-terminals, due to the dependence, until a word
of $s(a)$ is generated.

We do not know whether the transitive pcCFL class is closed under homomorphic images; however we suppose
it is not. We conjecture that a counterexample is given by the language

$$L = \{w : \#_a(w) = \#_b(w) = \#_c(w),\ \#_d(w) = 1\}$$

together with the homomorphism $h(a) = a$, $h(b) = b$, $h(c) = c$, $h(d) = dd$.

Intersection with regular languages. Both classes pcCFL and transitive pcCFL lack closure under intersection with
regular languages. Let $L = \{w : \#_a(w) = \#_b(w) = \#_c(w)\}$. Clearly $L \in$ cCFL but $L \cap a^*b^*c^*$ is not in pcCFL
(and also not in shuffle CFL, defined in a moment) according to:

Lemma 1 The language $L = \{a^n b^n c^n : n \ge 1\}$ is not in pcCFL $\cup$ shuffle CFL.

It is worth noting that the lack of closure is not surprising as the emptiness problem for intersection
of a partially-commutative context-free language with a regular language is undecidable, even if the
dependence is assumed to be transitive. Roughly speaking, transitive pcCFL corresponds to stateless multi-pushdown
automata, and intersection with a regular language corresponds to adding states, which makes the model
Turing powerful.

Inverse homomorphic images. Both pcCFL and transitive pcCFL are not closed under inverse homomorphic
images. Consider the shuffle $L = L_1 \parallel L_2$ of two context-free languages

$$L_1 = \{A^{n+1} S B^n T : n \ge 1\} \qquad L_2 = \{S B^n T C^n : n \ge 1\},$$

and the homomorphism $h$ given by $h(a) = A$, $h(s) = SS$, $h(b) = BB$, $h(t) = TT$ and $h(c) = C$. If $h^{-1}(L) =
\{a^{n+1} s b^n t c^n : n \ge 1\}$ were in pcCFL then its image under a homomorphism $g(s) = b$, $g(t) = c$, that is the
language $L$ in Lemma 1, would be in pcCFL as well, a contradiction.



5 Other extensions of context-free languages 

There are two other language classes known from the literature that, similarly to pcCFL, extend CFL with
some amount of commutation.

PA languages. The formalism to be described below is traditionally called Process Algebra (PA) [2, 12].
It is however nothing else than an extension of Greibach context-free grammars with an explicit shuffle
operation: a production has the form

$$X \xrightarrow{a} t,$$

where $t$ is an arbitrary term built from non-terminals using binary operations of sequential composition
';' and parallel composition '$\parallel$'. The first operation one may interpret as concatenation of languages,
and the second one as shuffle (thus the overloading of the symbol $\parallel$ is absolutely deliberate). The empty
term $\varepsilon$ is also allowed.

For convenience, terms are only considered up to a structural equivalence that imposes associativity
of both operations, commutativity of $\parallel$, and neutrality of $\varepsilon$ with respect to both operations.

A configuration is an arbitrary term of the above form. Steps between configurations are defined by
the following rules (the last rule is in fact redundant due to commutativity of $\parallel$, but we prefer to keep it
for readability):

$$\frac{X \xrightarrow{a} t \text{ is a production}}{X \xrightarrow{a} t} \qquad \frac{t \xrightarrow{a} t'}{t ; u \xrightarrow{a} t' ; u} \qquad \frac{t \xrightarrow{a} t'}{t \parallel u \xrightarrow{a} t' \parallel u} \qquad \frac{u \xrightarrow{a} u'}{t \parallel u \xrightarrow{a} t \parallel u'}$$

As usual, a derivation is a sequence of configurations starting from a distinguished initial configuration $S$,
ending in the empty configuration, such that every subsequent configuration is obtained from the preceding
one by a single step. Other notions, including the language generated by a grammar, or derivation trees,
may be defined similarly as for pcCFL. The class of languages we denote by shuffle CFL.

In particular, shuffle CFL satisfies the three properties mentioned above: Induced subword, Infix
rearrangement and Substitutivity.

The difference between pcCFL and shuffle CFL is, roughly, the difference between specifying commutation
explicitly in productions, or implicitly by an independence relation.

Trace-closures of CFL. To define trace CFL we need to assume that an independence relation ranges not 
over non-terminals but over alphabet letters instead. As usual, one defines trace equivalence over words: 
two words are equivalent if one may be transformed into another by swaps of neighboring independent 
letters. A context-free language L is not closed under this equivalence in general and its trace closure 

$$\{w : w \text{ is trace equivalent to some } v \in L\}$$

is in general not context-free. By trace CFL we denote the class containing trace closures of context-free 
languages. Clearly trace CFL is a superclass of CFL. 
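For a finite set of words the trace closure can be computed by saturating under swaps of neighboring independent letters. The Python sketch below is an illustration only; the function name `trace_closure` and the representation of the independence relation as a set of ordered pairs of letters are our own assumptions.

```python
from collections import deque


def trace_closure(words, independent):
    """Close a finite set of words under swaps of adjacent independent letters.

    `independent` is a symmetric set of ordered pairs of letters.
    """
    closure = set(words)
    queue = deque(closure)
    while queue:
        w = queue.popleft()
        for i in range(len(w) - 1):
            if (w[i], w[i + 1]) in independent:
                swapped = w[:i] + w[i + 1] + w[i] + w[i + 2:]
                if swapped not in closure:
                    closure.add(swapped)
                    queue.append(swapped)
    return closure
```

With a and b independent and c dependent on both, the closure of the single word abc consists of abc and bac only.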

6 Pumping lemmas 

Now we analyze how much the classical idea of pumping extends from CFL to larger classes. Roughly 
speaking, the intuitive cutting and pasting in a derivation tree does not translate to the property of a 
language as easily as in the case of CFL. 

We formulate two different pumping lemmas. Remarkably, with one of them we complete nicely the 
picture of pumping lemmas known for regular, context-free and commutative context-free languages. 

As expected, the pumping lemmas appear to be a useful tool for relating the expressive power of
language classes, as we demonstrate in Section 7.

The pumping lemmas. The length of a word $w$ is written $|w|$. To motivate our conditions we start by
recalling the pumping scheme proposed for cCFL by [5].

(cCFL-PUMPING [5]) There is a constant $N$ such that if $w \in L$ with $|w| > N$ then there exist
words $x, y, s$ such that

1. $w \in x (s \parallel y)$,

2. $1 \le |s| \le N$, and

3. $\forall m \ge 0$, $x s^m y \in L$.⁸

Point 1 reads as: $w$ is a concatenation of some prefix $x$ and an interleaving of $s$ and $y$. We define now two
new conditions on a language $L$.

(SHUFFLE PUMPING) There is a constant $N$ such that if $w \in L$ with $|w| > N$ then there exist
words $x, y, z, s, t$ such that

1. $w \in x ((s (y \parallel t)) \parallel z)$,

2. 1 < <N, and 

3. $\forall m \ge 0$, $x s^m y t^m z \in L$.

Point 1 reads as: there is some subword $y'$ of $w$ with $w \in x (y' \parallel z)$ and $y' \in s (y \parallel t)$.

(CONCAT. PUMPING) There is a constant $N$ such that if $w \in L$ with $|w| > N$ then there exist
words $x, y, z, s, t$ such that

1. $w = xyz$,

2. $1 \le |st| \le N$, and

3. $\forall m \ge 0$, $x s^m y t^m z \in L$.

Call the words $s$, $t$ repeatable words. The difference between the two conditions concentrates on the
word $y$ that separates the repeatable words in $x s^m y t^m z$. On one hand SHUFFLE PUMPING seems weaker
as $y$ is no more an infix of $w$, but an arbitrary subword (subsequence). On the other hand SHUFFLE
PUMPING seems stronger as the length of $y$ is bounded.

Lemma 2 Every language $L \in$ pcCFL $\cup$ shuffle CFL satisfies SHUFFLE PUMPING.

As an example of application we provide now a proof missing in Section 4.

Proof of Lemma 1. Assume towards contradiction that $L = \{a^n b^n c^n : n \ge 1\}$ is in pcCFL or in shuffle CFL
and apply Lemma 2. Observe that the two repeatable words $s$ and $t$ necessarily have jointly the same
number of letters $a$, $b$ and $c$. Thus one of them has to contain two different letters. Repeating this word
twice leads to a contradiction. □
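The counting argument in this proof can be checked exhaustively for short repeatable words. The Python sketch below is only an illustration of the argument, not a proof; the function names and the length bound used in the check are our own choices. It relies on two observations: if $st$ is not balanced, the words for $m = 0$ and $m = 1$ cannot both have equal numbers of $a$, $b$ and $c$; and if $st$ is balanced and non-empty, one of $s$, $t$ contains two different letters, so its square already falls outside $a^*b^*c^*$.

```python
from itertools import product


def balanced(u):
    """True if u has equally many letters a, b and c."""
    return u.count("a") == u.count("b") == u.count("c")


def in_a_star_b_star_c_star(u):
    """True if u belongs to a*b*c*, i.e. its letters are in sorted order."""
    return list(u) == sorted(u)


def cannot_pump(s, t):
    """True if s, t cannot serve as repeatable words for {a^n b^n c^n}."""
    if not balanced(s + t):
        # Letter counts drift apart when m changes from 0 to 1.
        return True
    # Here s + t is balanced; squaring s or t must break the a*b*c* shape.
    return not (in_a_star_b_star_c_star(s + s)
                and in_a_star_b_star_c_star(t + t))


def all_pairs(max_len):
    """All pairs (s, t) over {a, b, c} with 1 <= |st| <= max_len."""
    for n in range(1, max_len + 1):
        for letters in product("abc", repeat=n):
            st = "".join(letters)
            for cut in range(n + 1):
                yield st[:cut], st[cut:]
```

Evaluating `all(cannot_pump(s, t) for s, t in all_pairs(6))` confirms the claim for all pairs of total length at most 6.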

Lemma 3 Every language $L \in$ transitive pcCFL $\cup$ shuffle CFL satisfies CONCAT. PUMPING.

Class pcCFL does not satisfy CONCAT. PUMPING, as witnessed by the language from Example 1.
Moreover, in CONCAT. PUMPING one cannot bound the length of the word $y$.

Relating conditions. The condition SHUFFLE PUMPING is similar to the classical context-free pumping:
the only difference is that the words $s$, $y$, $t$ and $z$ are subwords, not necessarily infixes, of $w$. We claim it is
an elegant completion of the pumping lemmas for regular languages (RL), context-free languages (CFL)
and commutative context-free languages (cCFL) (see [5]). All of these lemmas may be described by
the following two characteristics:

1. Are there one or two pumping positions?

2. Are repeatable words infixes or subwords of a given word?

The known pumping lemmas have the following characteristics:

• RL: 1 pumping position, a repeatable word is an infix

• CFL: 2 pumping positions, repeatable words are infixes

• cCFL: 1 pumping position, a repeatable word is a subword [5].

⁸ In fact in [5], the pumping scheme was $x s^m y'$, with a suffix $y'$ of $w$ (think of $y' \in s \parallel y$), rather than $x s^m y$. The proofs of
both are very similar. We discuss this issue further in Remark 1.

In this light, our condition SHUFFLE PUMPING offers an elegant completion of the picture: 2 pumping
positions, repeatable words are subwords. In other words, SHUFFLE PUMPING weakens cCFL-pumping
in the same way as CFL-pumping weakens RL-pumping (2 pumping positions instead of one). The other
way around: SHUFFLE PUMPING weakens CFL-pumping in the same way as cCFL-pumping weakens
RL-pumping (a repeatable word is no more an infix). The relationships between the four pumping conditions
are depicted in the following diagram:



[Diagram: the four pumping conditions (RL-pumping, CFL-pumping, cCFL-pumping, SHUFFLE PUMPING) arranged along two axes: one pumping position versus two pumping positions, and repeatable infix versus repeatable subword.]
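The two characteristics used above can be made concrete with a small Python sketch. The function names and the table `CONDITIONS` are illustrative assumptions introduced here; the table only restates the classification given in the text.

```python
def is_infix(u: str, w: str) -> bool:
    """u occurs in w as one contiguous block."""
    return u in w

def is_subword(u: str, w: str) -> bool:
    """u occurs in w as a scattered, not necessarily contiguous, subsequence."""
    remaining = iter(w)
    # 'ch in remaining' consumes the iterator up to the first match,
    # so the letters of u are matched from left to right.
    return all(ch in remaining for ch in u)

# (number of pumping positions, kind of repeatable word)
CONDITIONS = {
    "RL-pumping": (1, "infix"),
    "CFL-pumping": (2, "infix"),
    "cCFL-pumping": (1, "subword"),
    "SHUFFLE PUMPING": (2, "subword"),
}

print(is_infix("ac", "abc"), is_subword("ac", "abc"))
```

Every infix is a subword, but not conversely: `ac` is a subword of `abc` without being an infix of it. This is why a subword-based condition is weaker than the corresponding infix-based one.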

Remark 1 It is worth mentioning that another pumping scheme could be used in place of SHUFFLE
PUMPING in Lemma 2: instead of $x s^m y t^m z$, one may consider

$$x s^m y' t^m z,$$

with $w \in x (y' \parallel z)$ and $y' \in s (y \parallel t)$. The proof would be very similar.



7 Expressiveness 

Now we are ready to compare the expressive power of ptcCFL and pcCFL with other classes. We show that
ptcCFL is a strict subclass of pcCFL and that both shuffle CFL and trace CFL are incomparable with either ptcCFL or
pcCFL. More specifically, our results are as follows:

Theorem 3 ptcCFL is a strict subclass of pcCFL.

Theorem 4 The following non-inclusions hold: 

(1) pcCFL $\cap$ shuffle CFL is not included in trace CFL.

(2) pcCFL $\cap$ trace CFL is not included in shuffle CFL.

Theorem 5 The following non-inclusions hold: 

(1) ptcCFL is not included in shuffle CFL $\cup$ trace CFL;

(2) shuffle CFL is not included in pcCFL $\cup$ trace CFL;

(3) trace CFL is not included in pcCFL $\cup$ shuffle CFL.

The proofs of the results are by identifying witnessing languages $L_1, \ldots$, as illustrated in Figure 1. The
pumping lemmas, namely Lemma 3 and Lemma 2, are sufficient to prove Theorem 3 and Theorem 5(3),
respectively. On the other hand, they are not sufficient for Theorem 4(2) and Theorem 5(1)-(2), as shuffle CFL
satisfies both lemmas, and thus we have to perform a more delicate analysis of a derivation tree. We
illustrate the first kind of argument in the proof of Theorem 5(3) and the second kind in the proof of
Theorem 4(2) below.




Figure 1: Relating the expressive power.

Proof of Theorem 5(3). Consider the language

$$L_6 = \Big\{ w \in \bigcup_{n > 0} \big( a^n \bar{a} \bar{d} d^n \parallel b^n c^n \big) : \text{every } b \text{ precedes every } \bar{d} \text{ and } d \text{ in } w \Big\}. \tag{5}$$

Clearly, $L_6$ is the trace closure of the context-free language $\{(ab)^n \bar{a} \bar{d} (cd)^n : n > 0\}$, if for the
independence on alphabet letters one chooses the symmetric closure of:

$$\{a, \bar{a}\} \times \{b, c\} \;\cup\; \{d, \bar{d}\} \times \{c\}.$$
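The trace-closure claim can be checked mechanically on a small instance. The Python sketch below is an illustration under stated assumptions: the letters $\bar{a}$ and $\bar{d}$ are encoded as `A` and `D`, the names `in_L6`, `project` and `trace_closure` are introduced here, and membership is tested through projections (a word lies in a shuffle of two words over disjoint alphabets exactly when its projections onto those alphabets are the two words).

```python
from itertools import permutations

# Encoding assumption: 'A' stands for a-bar and 'D' for d-bar.
# Independent pairs: {a, a-bar} x {b, c} and {d, d-bar} x {c}, made symmetric.
INDEP = {frozenset(p) for p in ("ab", "ac", "Ab", "Ac", "dc", "Dc")}

def project(w: str, letters: str) -> str:
    """Erase from w every letter that is not in `letters`."""
    return "".join(ch for ch in w if ch in letters)

def in_L6(w: str) -> bool:
    """Membership in L6 as defined in (5), with the union taken over n > 0."""
    n = w.count("b")
    if n == 0 or not set(w) <= set("aAbcdD"):
        return False
    return (project(w, "aAdD") == "a" * n + "AD" + "d" * n
            and project(w, "bc") == "b" * n + "c" * n
            # every b precedes every d-bar and every d
            and project(w, "bdD") == "b" * n + "D" + "d" * n)

def trace_closure(w: str) -> set:
    """All words reachable from w by swapping adjacent independent letters."""
    seen, stack = {w}, [w]
    while stack:
        u = stack.pop()
        for i in range(len(u) - 1):
            if frozenset(u[i:i + 2]) in INDEP:
                v = u[:i] + u[i + 1] + u[i] + u[i + 2:]
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
    return seen

seed = "abADcd"  # (ab)^1 a-bar d-bar (cd)^1
members = {"".join(p) for p in permutations(seed) if in_L6("".join(p))}
print(trace_closure(seed) == members)
```

The comparison is exhaustive only for $n = 1$; for larger $n$ the same functions can be used to confirm that every word of the closure passes `in_L6`, at a higher enumeration cost.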

Using Lemma 2 we will show that $L_6$ belongs to neither pcCFL nor shuffle CFL. Consider a word

$$w_n = a^n \bar{a} b^n c^n \bar{d} d^n$$

and recall that for $n$ larger than $N$ of Lemma 2 we would obtain

$$w_n \in x (y' \parallel z), \qquad y' \in s (y \parallel t)$$

for a substring $y'$ of $w_n$. Recall also the pumping scheme of SHUFFLE PUMPING from Lemma 2:

$$x s^m y t^m z \in L_6, \quad \text{for } m > 0. \tag{6}$$

We make a sequence of simple observations. First, to keep the same number of appearances of letters $a$, $b$, $c$
and $d$, each of the four letters must appear either in $s$ or $t$. Second, both $s$ and $t$ are necessarily non-empty
as otherwise we would observe an illegal order of letters in (6), and moreover $a$ and $b$ occur in $s$ and $c$
and $d$ occur in $t$, keeping in mind that in $L_6$ every $a$ precedes every $d$ and every $b$ precedes every $c$ and
$d$. Third, the length of the prefix $x$ is at most $n$, as otherwise both $s$ and $t$ would appear to the right of $\bar{a}$
and thus could not contain $a$. Thus, $x$ contains only $a$. Now, $\bar{d}$ is not in $x$, cannot be in $s$ or $t$, and cannot
be in $z$ since otherwise ($s$ and) $t$ could not contain $d$. Therefore $\bar{d}$ is in $y$, and $z$ contains no $b$. As neither
$x$ nor $z$ contains $b$, and $w_n \in x (y' \parallel z)$, $y'$ must contain $n$ occurrences of $b$, but $|y'| = |syt| < N$, hence this
is not possible. We have thus shown that $L_6$ does not satisfy Lemma 2 and therefore it does not belong
to pcCFL $\cup$ shuffle CFL. □

Proof of Theorem 4(2). Consider the language $L_3 \in$ ptcCFL:

$$L_3 = \bigcup_{n > 0} a^n s\, (b^n \parallel c^n) \tag{7}$$

and a grammar that generates the language:

S SP P C C 

The initial non-terminal is $S$ and the threads are $\{S, P\}$, $\{B\}$, $\{C\}$. $L_3$ also belongs to trace CFL as it is
the trace closure of the context-free language $\{a^n s (bc)^n : n > 0\}$ with independence $\{(b,c), (c,b)\}$.
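As a small sanity check of the definition (7), the following Python sketch enumerates one layer of $L_3$ and tests membership. The names `shuffle` and `in_L3` are assumptions introduced for the illustration; the sketch relies on the observation that, with $b$ and $c$ independent, the trace closure of $(bc)^n$ consists of all words with $n$ letters $b$ and $n$ letters $c$, which is exactly $b^n \parallel c^n$.

```python
def shuffle(u: str, v: str) -> set:
    """All interleavings of the words u and v."""
    if not u or not v:
        return {u + v}
    return ({u[0] + w for w in shuffle(u[1:], v)}
            | {v[0] + w for w in shuffle(u, v[1:])})

def in_L3(w: str) -> bool:
    """Membership in L3 as defined in (7), with the union taken over n > 0."""
    if w.count("s") != 1:
        return False
    left, right = w.split("s")
    n = len(left)
    return (n > 0
            and left == "a" * n
            and set(right) <= {"b", "c"}
            and right.count("b") == n
            and right.count("c") == n)

# The layer n = 2 of L3: a a s followed by any interleaving of bb and cc.
layer = {"aas" + r for r in shuffle("bb", "cc")}
print(sorted(layer))
print(all(in_L3(w) for w in layer))
```

The letter `s` acts as a separator: everything to its left is a block of `a`, and everything to its right is an arbitrary arrangement of equally many `b` and `c`.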

It thus remains to show that $L_3 \notin$ shuffle CFL. Intuitively, the idea is to show that $L_3$ cannot benefit from
parallel composition.

Assume that $L_3 \in$ shuffle CFL, aiming at deducing a contradiction. Fix a grammar $G$ that generates $L_3$. For
simplicity think of the productions of the following form (the first two we will call sequential):

$$X \to e \qquad X \to Y ; Z \qquad X \to Y \parallel Z.$$

We will exploit the property that $s$ divides every word in $L_3$ into two separated regions. We partition the
non-terminals into symbols that generate some word containing $s$, and symbols that do not; we call them
$s$-symbols and non-$s$-symbols, respectively. By SUBSTITUTIVITY, each word generated by an $s$-symbol
necessarily contains $s$.

Consider a derivation tree $T$ of a word $wsv \in L_3$. We call the unique path leading from the root to the leaf
labeled by $s$ the spine. Observe that an $s$-symbol may only appear on the spine and a non-$s$-symbol
may only appear outside the spine. Knowing that the number of occurrences of $a$ and $b$ on both sides of
the spine is the same, we deduce that

each production labeling a node of the spine is necessarily sequential. (8)

Indeed, assume a parallel production $X \to Y \parallel Z$ labels a node of the spine. Wlog. let $Y$ be an $s$-symbol.
Let $u$, $u'$ be the subwords induced by the $Y$-node and $Z$-node, respectively. Clearly there are two
interleavings of $u$ and $u'$ such that the letter $s$, appearing in $u$, is placed in the interleaving in two different
positions in the word $u'$. Thus at least one of these interleavings must lead to a violation of the
condition (7) in a word belonging to $L_3$. Condition (8) is thus proved.

Now consider a non-$s$-symbol $X$ appearing in $T$. The number of occurrences $\#_a(u)$ of $a$ in all words
$u$ generated by $X$ is necessarily the same, and the same applies to $\#_b(u)$ and $\#_c(u)$. Indeed, otherwise one
gets a similar contradiction as above by considering two words induced by the $X$-node, differing in the
number of occurrences of $a$ or $b$, and using SUBSTITUTIVITY. As a consequence $X$ generates a finite
language which may clearly be defined by a context-free grammar, say $G_X$.

If we apply the last observation to the very first non-$s$-symbol $X$ on every path in $T$ (except the
spine), we obtain a tree without parallel nodes. As $G_X$ does not depend on the particular derivation tree
$T$ chosen, and the word $wsv \in L_3$ was chosen arbitrarily, we conclude that $L_3$ is generated by a context-free
grammar. The grammar is obtained by replacing productions of every non-$s$-symbol $X$ in $G$ with $G_X$. As
$L_3$ is clearly not context-free, we obtain a contradiction and thus complete the proof. □